-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Dynamic buffer calc] Support dynamic buffer calculation #6194
[Dynamic buffer calc] Support dynamic buffer calculation #6194
Conversation
1. add tables required for the feature: - ASIC_TABLE in platform/mellanox/asic_table.j2 (Mellanox specific) - PERIPHERAL_TABLE in platform/mellanox/peripheral_table.j2 (Mellanox specific) - PORT_PERIPHERAL_TABLE on a per-platform basis in device/mellanox/x86_64-mlnx_msn3800-r0/port_peripheral_config.j2 (Mellanox specific) - DEFAULT_LOSSLESS_BUFFER_PARAMETER and LOSSLESS_TRAFFIC_PATTERN in files/build_templates/buffers_config.j2 - Add lossless PGs (3-4) for each port in files/build_templates/buffers_config.j2 2. copy the newly introduced j2 files into image and rendering them when system starts 3. update the CLI options for buffermgrd so that it can start with dynamic mode 4. Adjust the order in which swss daemons start, making buffermgrd start before orchagent. This is to make sure buffermgrd can get everything ready before orchagent starts especially during warm reboot. 5. Optimize the way in which orchagent fetches the asic vendor name: - fetch the vendor name when creates the docker and pass it as a docker environment variable - orchagent and buffermgrd can use this passed-in variable 6. Clear buffer related tables from STATE_DB when swss docker starts 7. Update the src/sonic-config-engine/tests/sample_output/buffers-dell6100.json according to the buffer_config.j2 8. Remove buffer pool sizes for ingress pools and egress_lossy_pool Update the buffer settings for dynamic buffer calculation Signed-off-by: Stephen Sun <stephens@nvidia.com>
1. If all the buffer configuration aligns the default, use dynamic buffer calculation mode. Otherwise, use the traditional mode 2. Dynamic mode is adopted in switches newly installed from scratch and traditional mode in switches installed from ninigraph This is done by: - introducing a wrapper to buffermgrd, which enables it to test which mode is adopted and feed buffermgrd with corresponding CLI arguments - preparing default buffer configuration templates for both dynamic and traditional mode for each SKU - introduce "buffer_model" in DEVICE_METADATA|localhost for mellanox platform Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- For Mellanox, dynamic by default - For other vendors, traditional mode. Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
…able Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
fi | ||
if [ ! -f /etc/sonic/peripheral_table.json ] && [ -f /usr/share/sonic/device/$PLATFORM/port_peripheral_config.j2 ]; then | ||
sonic-cfggen -d -t /usr/share/sonic/device/$PLATFORM/port_peripheral_config.j2 > /etc/sonic/peripheral_table.json | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this block into inside swss container? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the asic table and peripheral table won't be changed on-the-fly, I'd like to have them called once only when the docker is created instead of called each time the swss docker starts.
Let me know if there are other concerns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems here is called when the docker swss is started, not created.
You already check the files existing, so not too much time wasted after file created.
In reply to: 541481038 [](ancestors = 541481038)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok.
Let's move to buffermgrd.sh
. Put the code generating asic_table.json
and peripheral_table.json
just ahead of the place where it is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just investigated and realized it’s will be very difficult to move this part into swss docker because it requires asic_table.j2
and peripheral_table.j2
to be copied in docker swss. However, it’s defined on a per vendor basis and in platform/<vendor> dir, which makes it difficult to copy into docker in the Dockerfile.j2
.
So we have to have this part in docker_image_control.
Meanwhile, I also noticed that the PLATFORM isn't defined in this function. This can be achieved by moving this part into start()
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding "it’s defined on a per vendor basis and in platform/ dir, which makes it difficult to copy into docker in the Dockerfile.j2"
I checked docker inspect swss
and found /usr/share/sonic/device/$PLATFORM/
is already mapped into swss container.
{
"Type": "bind",
"Source": "/usr/share/sonic/device/x86_64-mlnx_msn2700-r0",
"Destination": "/usr/share/sonic/platform",
"Mode": "ro",
"RW": false,
"Propagation": "rprivate"
},
Mounting is happened after running a container, so I did not propose modify Dockerfile.j2
, instead, modify dockers/docker-orchagent/docker-init.sh
.
I think you can treat asic_table.j2
the same way as port_peripheral_config.j2
. Why they are copied into different folders?
In reply to: 541513109 [](ancestors = 541513109,541481038)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The port peripheral config is defined on a per platform basis because each platform has its unique configuration. But the asic_table.j2 is shared among all platforms of the same vendor. It should not be copied to each of the platform folders.
@@ -243,6 +250,11 @@ start() { | |||
echo "Creating new ${DOCKERNAME} container with HWSKU $HWSKU" | |||
{%- endif %} | |||
|
|||
{%- if docker_container_name == "swss" %} | |||
# Obtain the vendor name | |||
ASIC_VENDOR=`$SONIC_CFGGEN -y /etc/sonic/sonic_version.yml -v asic_type` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this into inside swss container? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as the previous one. I'd like call this once during docker creation instead of many times.
@@ -1,6 +1,7 @@ | |||
{ | |||
"DEVICE_METADATA": { | |||
"localhost": { | |||
"buffer_model": {% if default_buffer_model == "dynamic" %}"dynamic"{% else %}"traditional"{% endif %}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
traditional [](start = 90, length = 11)
dynamic vs static? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dynamic
/ traditional
is the term we already agreed on, isn't it?
@@ -392,6 +392,19 @@ sudo cp $BUILD_TEMPLATES/buffers_config.j2 $FILESYSTEM_ROOT_USR_SHARE_SONIC_TEMP | |||
# Copy the qos configuration template | |||
sudo cp $BUILD_TEMPLATES/qos_config.j2 $FILESYSTEM_ROOT_USR_SHARE_SONIC_TEMPLATES/ | |||
|
|||
# Copy the templates for dynamically buffer calculation | |||
{% if sonic_asic_platform == "mellanox" or sonic_asic_platform == "vs" %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sonic_asic_platform [](start = 43, length = 19)
What is the future expectation on other platform?
If nothing, why need it for vs? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe other vendors will support the dynamic model in the future.
Meanwhile, we need this to verify this feature on vs platform.
By default, the dynamic model is disabled on vs platform and is enabled when the test cases of this feature are executed.
src/sonic-config-engine/tests/sample_output/py2/buffers-dell6100.json
Outdated
Show resolved
Hide resolved
src/sonic-config-engine/tests/sample_output/py3/buffers-dell6100.json
Outdated
Show resolved
Hide resolved
src/sonic-config-engine/tests/sample_output/py2/buffers-dell6100.json
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As comments
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- remove vs related stuff from Mellanox specific asic_table.json and peripheral_table.json - fix minor errors Signed-off-by: Stephen Sun <stephens@nvidia.com>
{%- if DEVICE_METADATA is defined and DEVICE_METADATA['localhost']['platform'] is defined %} | ||
{%- set platform = DEVICE_METADATA['localhost']['platform'] %} | ||
{%- else -%} | ||
{%- set platform = "vs-platform" %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same issue, move vs
related part into platform/vs/peripheral_table.j2
. Please don't mix platforms into mellanox
folder. #Closed
Remove vs stuff from Mellanox related files and fix indent Signed-off-by: Stephen Sun <stephens@nvidia.com>
retest vs please |
@@ -31,3 +31,5 @@ $(SYNCD)_RDEPENDS += $(MLNX_SAI) | |||
|
|||
# Inject mlnx sdk libs to platform monitor | |||
$(DOCKER_PLATFORM_MONITOR)_DEPENDS += $(APPLIBS) $(SX_COMPLIB) $(SXD_LIBS) $(SX_GEN_UTILS) $(PYTHON_SDK_API) $(APPLIBS_DEV) $(SX_COMPLIB_DEV) $(SXD_LIBS_DEV) $(SX_GEN_UTILS_DEV) | |||
|
|||
export SONIC_BUFFER_MODEL=dynamic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not change build time config in component makefile, if you need, change the pipeline command line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Qi,
Yes, it works to change the CLI option but what's the benefit to change a working code?
- Why I did it
To support dynamic buffer calculation.
This PR also depends on the following PRs for sub modules
- How I did it
buffer_model
inDEVICE_METADATA|localhost
to represent which buffer model is running in the system currently:dynamic
for the dynamic buffer calculation modeltraditional
for the traditional model in which thepg_profile_lookup.ini
is usedbuffermgrd
can use this passed-in variableUpdate the buffer settings for dynamic buffer calculation
Signed-off-by: Stephen Sun stephens@nvidia.com
- How to verify it
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)